2026-03-01 – Provider Arsenal & Memory Upgrade
Session Overview
Major expansion of AI provider coverage, native memory system upgrade, and planning for sub-agent architecture. Shift from "get it working" to "get it optimized."
New Provider Keys Configured
All tested, saved to both the-ford-estate.env and ~/.openclaw/.env:
- Mistral (MISTRAL_API_KEY) – 56 models, free tier, 60 RPM
- DeepSeek (DEEPSEEK_API_KEY) – $0 balance (needs top-up)
- Groq (GROQ_API_KEY) – free tier, very fast, low latency
- DashScope China (ALIBABA_CLOUD_API_CN) – 161 models, but they require activation
- NVIDIA NIM (NVIDIA_API_KEY) – 40+ S+ tier models, free
- Cerebras (CEREBRAS_API_KEY) – fastest inference available, free
- Codestral (CODESTRAL_API_KEY) – free coding-only Mistral models
- Together AI (TOGETHER_API_KEY) – $25 free credit
- SambaNova (SAMBANOVA_API_KEY) – free tier, fast inference
- HuggingFace (HF_TOKEN) – valid, but endpoint routing still needs config
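A quick sanity check before wiring the keys into openclaw can be sketched like this; the variable names are the ones listed above, and the check only confirms each variable is set in the environment, not that the key is actually valid at the provider:

```python
import os

# Provider key names as saved in the-ford-estate.env / ~/.openclaw/.env
PROVIDER_KEYS = [
    "MISTRAL_API_KEY", "DEEPSEEK_API_KEY", "GROQ_API_KEY",
    "ALIBABA_CLOUD_API_CN", "NVIDIA_API_KEY", "CEREBRAS_API_KEY",
    "CODESTRAL_API_KEY", "TOGETHER_API_KEY", "SAMBANOVA_API_KEY", "HF_TOKEN",
]

def missing_keys(env=os.environ):
    """Return the provider keys that are absent or empty in the environment."""
    return [k for k in PROVIDER_KEYS if not env.get(k)]

if __name__ == "__main__":
    gaps = missing_keys()
    print("all provider keys present" if not gaps else f"missing: {', '.join(gaps)}")
```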
Provider Status
| Provider | Status | Action Needed |
|---|---|---|
| DeepSeek | ⚠️ $0 balance | Top up at platform.deepseek.com |
| xAI/Grok | ⚠️ No credits | Purchase at console.x.ai |
| Z.AI | ⚠️ Depleted | Top up at open.bigmodel.cn |
| DashScope | ⚠️ Models need activation | Activate in Alibaba console |
| ElevenLabs | ⚠️ Quota exhausted | Needs plan upgrade |
Custom Providers Configured
Added to models.providers:
- nvidia – integrate.api.nvidia.com/v1 (6 models: Kimi K2.5, GLM 5, DeepSeek V3.2, Qwen3 Coder, MiniMax M2.1, Llama 3.3)
- deepseek – api.deepseek.com (2 models: chat, reasoner)
- dashscope – dashscope.aliyuncs.com (6 Qwen/DeepSeek models)
- codestral – codestral.mistral.ai (codestral-latest)
- sambanova – api.sambanova.ai (Llama 3.3, DeepSeek V3)
- together – api.together.xyz (3 models)
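Since these providers all expose OpenAI-compatible endpoints, a single router can resolve a `provider/model` id to the right base URL. A minimal sketch, with hostnames taken from the list above; the exact path suffixes and the `resolve` helper are assumptions, not openclaw's actual internals:

```python
# Base URLs from the custom-provider entries above.
PROVIDER_BASE_URLS = {
    "nvidia":    "https://integrate.api.nvidia.com/v1",
    "deepseek":  "https://api.deepseek.com",
    "dashscope": "https://dashscope.aliyuncs.com",
    "codestral": "https://codestral.mistral.ai",
    "sambanova": "https://api.sambanova.ai",
    "together":  "https://api.together.xyz",
}

def resolve(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' id into (base_url, bare model name)."""
    provider, _, model = model_id.partition("/")
    if provider not in PROVIDER_BASE_URLS:
        raise KeyError(f"unknown provider: {provider}")
    return PROVIDER_BASE_URLS[provider], model
```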
Model Strategy Defined
- Tier 0 (Free): NVIDIA NIM, Cerebras, Groq, Codestral, SambaNova, OpenRouter
- Tier 1 (Cheap): Gemini Flash, Mistral Small, Together AI
- Tier 2 (Moderate): GPT-4o-mini, Mistral Medium, Gemini Pro
- Tier 3 (Premium): GPT-4o, o3, Opus (last resort)
Target: 80% of usage at $0, 15% for pennies (Gemini Flash), 5% for dollars (Opus, only when needed)
Per-Agent Model Assignments
| Agent | Primary | Fallback 1 | Fallback 2 | Complex |
|---|---|---|---|---|
| Ada | nvidia/kimi-k2.5 | cerebras/gpt-oss-120b | google/gemini-2.5-flash | Opus subagent |
| K2 | nvidia/deepseek-v3.2 | codestral/codestral-latest | cerebras/qwen3-235b | Opus subagent |
| Cora | nvidia/kimi-k2.5 | google/gemini-2.5-flash | mistral/mistral-medium | – |
| Winston | google/gemini-2.5-flash | nvidia/kimi-k2.5 | groq/llama-3.3-70b | – |
| Synergy | google/gemini-2.5-flash | nvidia/kimi-k2.5 | groq/llama-3.3-70b | – |
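The fallback chains in the table can be sketched as a simple retry loop; `call` is a stand-in for the actual completion request (not a real openclaw API), and the chains are copied from the assignments above:

```python
# Fallback chains from the assignment table (primary, fallback 1, fallback 2).
AGENT_CHAINS = {
    "ada":     ["nvidia/kimi-k2.5", "cerebras/gpt-oss-120b", "google/gemini-2.5-flash"],
    "k2":      ["nvidia/deepseek-v3.2", "codestral/codestral-latest", "cerebras/qwen3-235b"],
    "cora":    ["nvidia/kimi-k2.5", "google/gemini-2.5-flash", "mistral/mistral-medium"],
    "winston": ["google/gemini-2.5-flash", "nvidia/kimi-k2.5", "groq/llama-3.3-70b"],
    "synergy": ["google/gemini-2.5-flash", "nvidia/kimi-k2.5", "groq/llama-3.3-70b"],
}

def complete_with_fallback(agent, prompt, call):
    """Try each model in the agent's chain until one succeeds.
    `call(model_id, prompt)` may raise on rate limits, depleted balances,
    or provider outages; the next model in the chain is tried on failure."""
    errors = []
    for model_id in AGENT_CHAINS[agent]:
        try:
            return model_id, call(model_id, prompt)
        except Exception as exc:
            errors.append((model_id, exc))
    raise RuntimeError(f"all models failed for {agent}: {errors}")
```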
Memory System Upgrade (Native)
| Feature | Before | After |
|---|---|---|
| Embeddings | OpenAI | Gemini (~20x cheaper) |
| Search | Vector only | Hybrid BM25 + Vector |
| Diversity | Redundant hits | MMR re-ranking (λ=0.7) |
| Recency | Flat | Temporal decay (30d half-life) |
| Scope | Workspace only | +K2/Cora/Winston/Synergy/ada-lab |
| Sessions | Not searchable | Full transcript indexing |
| Caching | Off | 50K entries |
All configured via openclaw config set, validated, gateway restarted cleanly.
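The upgraded retrieval pipeline can be sketched end to end: blend BM25 and vector scores, apply a 30-day half-life recency decay, then re-rank with MMR at λ=0.7. The blend weight `alpha` and the helper names are illustrative assumptions, not openclaw's documented internals; only the λ and half-life values come from the table above:

```python
import math

HALF_LIFE_DAYS = 30.0   # temporal decay half-life from the config above
MMR_LAMBDA = 0.7        # relevance/diversity trade-off from the config above

def decay(age_days: float) -> float:
    """Temporal decay: a memory's score halves every 30 days."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def hybrid_score(bm25: float, vector_sim: float, age_days: float,
                 alpha: float = 0.5) -> float:
    """Blend lexical (BM25) and vector scores, then apply recency decay.
    alpha is an assumed blend weight, not a documented openclaw setting."""
    return (alpha * bm25 + (1 - alpha) * vector_sim) * decay(age_days)

def mmr(candidates, sim_to_query, sim_between, k=5, lam=MMR_LAMBDA):
    """Maximal Marginal Relevance: pick results that are relevant to the
    query but dissimilar to what has already been selected."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * sim_to_query[c]
            - (1 - lam) * max((sim_between[(c, s)] for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```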
Model Scanner Cron
- 5am daily job configured
- Scans OpenAI, Mistral, Groq, DeepSeek, DashScope for model changes
- Tracks rate limit changes
- First baselines captured: OpenAI 120, Mistral 56, Groq 20, DeepSeek 2, DashScope 161 models
- Reports findings to morning briefing
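The scanner's core is a list-and-diff: pull each provider's model list from its OpenAI-compatible `/v1/models` endpoint and compare against the stored baseline. A minimal sketch; the endpoint shape is the standard OpenAI listing format, but how openclaw stores baselines is an assumption:

```python
import json
import urllib.request

def fetch_model_ids(base_url: str, api_key: str) -> set[str]:
    """List model ids from an OpenAI-compatible /v1/models endpoint."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {m["id"] for m in data["data"]}

def diff_models(baseline: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare today's scan against the stored baseline for the briefing."""
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }
```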
Free Provider Reference
Created comprehensive guide at memory/free-provider-reference.md:
- 19 providers with free tiers identified
- Priority signups: NVIDIA NIM, Cerebras, SambaNova, Codestral, HuggingFace, Cohere
- Rate limits documented
- Signup URLs compiled
Current Provider Arsenal
- Working (11): OpenAI, Anthropic, Gemini, Mistral, Groq, OpenRouter, NVIDIA NIM, Cerebras, Codestral, Together, SambaNova
- Known free (29 on OpenRouter): GPT-OSS-120B, Llama 3.3 70B, Qwen3 Coder, Hermes 405B, etc.
- Total accessible: 200+ models
Sub-Agent Architecture Research (Pending)
Problem: Domain-specific agents (K2, Cora, Winston, Synergy) all need similar functions (document retrieval, deep research) but currently no shared sub-agent layer.
Key Questions:
1. Shared service sub-agents vs domain-specific?
2. How to handle context passing between parent → sub-agent?
3. Tool access patterns (read-only vs read-write)?
4. Lifecycle: spawn → task → result → dispose vs persistent workers?
Sources to research:
- GitHub starred repos (personal)
- Reddit communities (r/MachineLearning, r/LocalLLaMA, r/OpenClaw, r/AutoGPT, r/CrewAI, r/ClaudeAI, r/ChatGPTCoding)
- Free-coding-models library architecture
- Agent-team-orchestration skill on ClawHub
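The spawn → task → result → dispose option can be sketched as a context manager: the sub-agent workspace exists only for the duration of one task, and nothing persists afterward. This is a generic disposable-worker pattern, not openclaw's actual spawn mechanism; the SOUL.md file mirrors the minimal-personality convention mentioned elsewhere in these notes:

```python
import pathlib
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def disposable_agent(role: str):
    """spawn -> task -> result -> dispose: workspace lives for one task only."""
    workspace = pathlib.Path(tempfile.mkdtemp(prefix=f"agent-{role}-"))
    # Minimal SOUL.md: functional, no personality.
    (workspace / "SOUL.md").write_text(f"# {role}\nFunctional sub-agent.\n")
    try:
        yield workspace  # parent writes task context in, reads result out
    finally:
        shutil.rmtree(workspace)  # dispose: no state survives the task
```

Usage: `with disposable_agent("research") as ws: ...` — the alternative (persistent workers) would skip the teardown and reuse the workspace across tasks.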
Next Actions
- [ ] Research sub-agent patterns from starred GitHub repos
- [ ] Identify Reddit communities for daily monitoring
- [ ] Define shared service sub-agent taxonomy
- [ ] Top up DeepSeek ($5-10) for cheap reasoning tier
- [ ] Configure HuggingFace endpoint routing
- [ ] Set Telegram profile pics via @BotFather
Morning Update – 2026-03-01 09:30 EST
Sub-Agent Architecture Implementation
- Created 4 disposable sub-agent workspaces: research, rag, coding, analysis
- Minimal SOUL.md files (functional, no personality)
- Spawn guide created at agents/SPAWN.md
- Test research spawn in progress
Key Insight
2026 multi-agent trend: Specialized agents in sequence (extractor → analyzer → checker), not "one mega-bot." Frameworks: LangGraph (stateful), AutoGen, Conductor, Swarm.
Reddit Communities Ready
| Community | Priority | Agent |
|---|---|---|
| r/OpenClaw | CRITICAL | Ada |
| r/CrewAI | HIGH | Ada |
| r/LocalLLaMA | HIGH | K2 |
| r/homelab | HIGH | K2 |
| r/realestate | HIGH | Cora |
| r/ClaudeAI | HIGH | K2/Ada |
| r/selfhosted | HIGH | K2 |
| r/Proxmox | HIGH | K2 |
| r/MachineLearning | MEDIUM | Ada |
| r/AutoGPT | MEDIUM | Ada |
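The daily-scan core for these communities can be sketched with Reddit's public JSON listing (`reddit.com/r/<sub>/new.json`, which needs no auth, only a User-Agent header); the watchlist entries come from the table above, while the keyword filter and its keywords are illustrative:

```python
import json
import urllib.request

WATCHLIST = {  # subreddit -> (priority, agent), from the table above
    "OpenClaw": ("CRITICAL", "Ada"),
    "CrewAI": ("HIGH", "Ada"),
    "LocalLLaMA": ("HIGH", "K2"),
}

def fetch_new_posts(subreddit: str, limit: int = 25) -> list[dict]:
    """Pull the newest posts via Reddit's public JSON listing."""
    url = f"https://www.reddit.com/r/{subreddit}/new.json?limit={limit}"
    req = urllib.request.Request(url, headers={"User-Agent": "agent-monitor/0.1"})
    with urllib.request.urlopen(req) as resp:
        listing = json.load(resp)
    return [child["data"] for child in listing["data"]["children"]]

def relevant(posts: list[dict], keywords: list[str]) -> list[dict]:
    """Keep posts whose title mentions any watched keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.get("title", "").lower() for k in lowered)]
```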
Next
- Set up Reddit monitoring crons (daily scanning)
- Implement shared service sub-agents
- Review GitHub starred repos (top 15 identified)